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Summary 


:( 

A  statistical  sensitivity  analysis  may  be  defined  and  performed 
in  terms  of  the  response  of  a  vector  of  parameter  estimates  to  variation 
in  the  way  sample  information  is  processed  vis-a-vis  to  tentative  under¬ 
lying  model.  The  mode  of  information  processing  is  a  generalization 
of  likelihood  and  is  indexed  on  a  non-statistical  parameter  c.  The  case 
c=0  corresponds  to  maximum  likelihood.  If  the  vector  of  parameter 
estimates  is  stable  under  moderate  increase  of  the  index  c  from  0,  the 
tentative  model  and  the  data  are  internally  consistent.  A  general 
procedure  for  the  conduct  of  such  sensitivity  analyses  is  given  along 
with  several  illustrations.  For  fixed,  positive  values  of  the  index,  one 
obtains  a  general  robust  estimation  procedure.  ( 


Key  Words:  sensitivity  analysis,  robust  procedure,  Gaussian,  logistic, 
Weibull,  extreme  value,  Poisson,  generalized  likelihood 


1.  Introduction 


Sensitivity  analyses  are  routinely  and  usefully  performed  in  the 
engineering  disciplines,  in  operational  research,  to  name  just  a  few. 

The  objective  of  a  sensitivity  analysis  is  to  determine  the  response  of 
a  solution  to  changes  in  the  assumptions,  to  changes  in  the  data,  or  to 
changes  in  the  information  that  are  used  in  modeling  representations  of 
reality.  Also  within  the  purview  of  a  sensitivity  analysis  is  the  deter¬ 
mination  of  responses  of  a  solution  to  changes  in  the  way  in  which 
information  is  processed.  If  small  or  moderate  perturbations  in  the 
assumptions,  data,  processing,  etc.  produce  large  changes  in  a  solution, 
then  valuable  information  has  been  provided  for  the  analyst  or  the  modeler 
since  certain  facets  of  the  model  or  of  the  information  are  critical  or 
require  further  attention. 

Residual  analysis,  jackknife  procedures ,  and  robust  procedures 
provide  several  ways  in  which  a  sensitivity  analysis  may  be  performed  in 
a  statistical  setting.  In  this  setting  the  data  and  a  tentatively  assumed 
model  should  be  considered  as  a  single  entity.  The  objective  of  a  sensi¬ 
tivity  analysis  is  to  determine  whether  the  data  and  the  tentatively 
assumed  model  are  internally  consistent  and  this  may  be  effected  by  con¬ 
trolling  the  way  information  provided  by  the  data  is  incorporated  in  the 
evaluation  of  model  parameters.  A  sensitivity  analysis  can  also  he  most 
useful  in  model  evolution.  We  propose  herein  a  general  procedure  for 
performing  a  sensitivity  analysis  of  a  data-model  unit  and,  secondarily, 
a  general  procedure  for  producing  robust  estimators  of  location,  scale, 
shape,  etc.  parameters. 
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There  is  by  now  an  extensive  literature  on  robust  methods  for 
location  problems.  Barnett  and  Lewis  (1978),  Huber  '1981),  and  Rey  (1977) 
provide  useful  summaries  of  currently  available  robust  methods.  There 
are  few  papers  on  construction  of  robust  estimators  for  non-location 
parameters  and  asymmetric  distributions.  This  paper  provides  such  a 
construction.  It  is  based  on  what  is  believed  to  be  a  new  generalization 
of  the  likelihood  or  Shannon's  information.  The  procedures  we  propose 
are  easily  implemented  and  are  readily  applied  in  a  modeling  or  structured 
data  framework.  Several  examples  are  provided.  The  Gaussian,  Weibull, 
gamma,  logistic,  and  Poisson  distributions  are  explicit!  considered. 

2.  Construction  of  a  Family  of  Estimators 
Let  x1,x2,...,xn  be  a  random  sample  from  the  normal  density 

n(x;y,o2)  =  (2TT02)15  exp(-  h  (-^p)2).  (2.1) 

The  log  likelihood  for  the  location  parameter  y ,  a  assumed  momentarily 
to  be  known,  is 

L(y )  =  l  log  f(x.  ;y,o  )  =  -  -?  log  2ir  =  -  j  log  a  -  h  l  (-“r— )  • 

j=l  J  *  L  j=i  ' 

(2.2) 

The  maximum  likelihood  estimator  of  y  is  determined  by  minimizing  the 
quadratic  form 

*5  l  (x.-y)2  (2.3) 

j=l  1 

with  respect  to  y.  In  a  seminal  paper  Huber  (1964)  suggested  that  y 
be  estimated  by  replacing  the  quadratic  in  (2.3)  by 
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n 

h  l  p(x.-Vl),  (2.4) 

1=1  : 

where  p(*)  is  a  convex  function  which  increases  less  rapidly  than  a 
quadratic;  in  particular  the  choice 


p(u) 


h  U2  ,  |u|s  K 

k|u|  -  h  2,  1 uj  >  K 


(2.5) 


is  optimal  in  a  minimax  sense.  Thus  (2.4)  represents  a  change  (from 

(2.3))  in  the  way  the  information  provided  by  the  x..  for  the  model 
2 

n(x;yi,cr  )  of  (2.1)  is  processed. 

Whereas  Huber  recommended  perturbing  the  quadratic  form  of  (2.3), 
we  shall  perturb  the  negative  of  the  likelihood  or  the  Shannon  infor¬ 
mation 

n 

I  =  -  l  log  n(x.;p,o*).  (2.6) 

1*1  1 

The  more  surprising  an  item  of  information,  that  is  an  x^  which  must  be 

2 

extreme,  the  larger  the  information  -  log  n(x^ ;y,o  ).  See  Barnett  and 
Lewis  ( 19 ■/ 8 ,  Chapter  9).  The  information  (2.6)  is  unbounded.  If 
difficulties  with  the  data-model  entity  are  expected,  it  is  natural  to 
curb  the  information  since  it  is  unbounded.  Figure  1  gives  a  plot  of 
-  log  n(x;0,l)  labeled  c=0,  a  single  terra  of  I  in  (2.6).  We  do  not 
advocate  discarding  the  large  information  items  but  rather  wish  to  partition 
the  information  so  that  we  can  find  and  isolate  the  surprising  items  and 
call  attention  to  them  for  further  study.  The  isolation  of  surprising 
items  is  not  especially  difficult  when  structure  is  not  involved;  graph¬ 
ical  techniques,  for  example,  are  especially  informative.  However,  when 


and  several  values  of  c 
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structure  is  involved,  the  matter  5j  no  longer  simple  and  powerful  tools 
may  be  needed. 

How  should  the  likelihood  be  perturbed  in  order  to  produce  procedures 
for  sensitivity  analysis  and  robust  estimation? 

We  first  address  the  problem  of  a  general  density  and  return  to 
the  Gaussian  case  in  the  next  section.  Let  x^,X2»*-«>xn  be  now  a  random 
sample  from  an  absolutely  continuous  density  f(x|6)  whose  interval  of 
support  is  non- trivially  -®<x<«  and  where,  for  concreteness  of  discussion 
ant  without  loss  of  generality,  9  is  taken  to  be  a  scalar  taking  on  values 
in  some  open  set  0.  Suppose  further  that  f(xje)  possesses  sufficient 
regularity  to  permit  the  standard  maximum  likelihood  operations  (Kendall 
and  Stuart,  1966,  Vol.  II,  Ch.  18).  For  each  i,  i=i,2,...,n 


j*  f(x.jje)  <1x^=1  (2.7) 

—00 

On  taking  partial  derivatives  of  both  sides  of  (2.7)  with  respect  to  9 
we  find 


.  3  log  f ( x • 1 0 )  . 

fUjIe)  dx.  .  j  | - 5H— |  fUile) 


dx.  =0 
l 


(2.8) 


,3  log  ftxje) > 

The  equation  (2.8)  implies  that  E  ( - ^ - )  =  0.  Further  an 

estimator  §,  3ay,  of  8  may  be  obtained  from  the  zeros  of 

n  3  log  f(x.  |0) 

l  - rx— * - =0  (2.9) 

i=l  30 


which  is  arrived  at  on  summing  the  quantity  in  brackets  {•}  in  (2.8) 
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over  i  and  setting  the  resulting  expression  to  zero.  This  expression 
coincides  with  the  estimating  equation  derived  from  differentiating  the 
log  likelihood 

n  n  fc(.x.  |e)-l 

£n(e)  =  l  log  f(.x.  |e)  =  lim  y  - - -  (2.10) 

0  i=l  1  c->0  i=l  c 

with  respect  to  9. 

Now  the  likelihood  is  essentially  a  geometric  mean  which  is  in 
turn  a  special  case  of  the  generalized  mean 


M(9,c)  =  (j 


1  fC<»,|9»:/C, 

_  A  «*■ 


(2.11) 


This  suggests  that  it  may  be  useful  to  determine  estimators  of  9  which 
make  use  of  the  density  raised  to  the  cth  power.  There  is  some  evidence 
for  such  an  approach  in  the  work  of  Paulson  and  Nicklin  (1981)  involving 
estimators  derived  from  distances  between  characteristic  functions. 

Under  very  mild  regularity  conditions  on  the  density  f ( x^ 1 9 )  there  exists 
a  function  Q(9;c)  such  that 


(xi|9)  =  Q(9;c) 


for  some  -k^<c<k2>  k^,k2>0.  Then 


f1+C(xi!9) 

J  Q( 9 ;c)  dxi  ^ 


1, 


(2.12) 


(2.13) 


and 
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36 


r 


f1+c(xi|e) 

Q(Q;c) 


r  i  f . 

J 


31og  fi 
36 


^1+c 

i 


log  Q' 
36 


dx.=0, 


(2.14) 


on  interchange  of  differentiation  and  integration.  The  arguments  of 
f(xi|e)  and  Q(6;c)  have  been  deleted  in  (2.14)  for  notational  convenience 
and  this  will  often  be  done  where  there  is  no  danger  of  misinterpretation. 
From  (2.14)  we  find 


ft# 


Slog  f. 


(2.15) 


We  thus  choose  as  our  estimating  equation  for  8, 


n  .  3  log  f(x.|e)  3  log  Q(9;c)  , 

f  1  (1*=>  — sH - 5e — r°.  <2-16) 

the  quantity  in  {•}  in  (2.15),  indexed  on  i  and  summed  over  i.  This  is 
exactly  analogous  to  the  way  in  which  (2.9)  was  determined  to  be  the 
maximum  likelihood  estimator  of  6.  Observe  that  Q(8;0)  =  1  and  that 
(2.16)  reduces  to  the  usual  likelihood  equation.  The  maximum  likelihood 
estimating  equation  (2.9)  was  developed  by  means  of  score  function  argu¬ 
ments.  Equation  (2.16)  is  also  developed  by  means  of  score  function 
arguments.  A  bona  fide  objective  function  which  gives  rise  to  (2.16) 
would  be  of  considerable  practical  and  theoretical  importance.  If  we 
regard  (2.16)  as  a  differential  equation  we  find  that  the  corresponding 
objective  function  is  given  by 


ye) 


l 

c 


fC(xi|6) 

(Q(8;c))c/<ltc) 


(2.17) 
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as  may  be  verified  by  differentiation  with  respect  to  6.  When  c-*0  in 

(2.17) ,  ^C(S)  -*--£g(9),  the  log  likelihood.  It  is  interesting  to  note  that 
£c<0)  does  not  coincide  with  the  right-most  side  of  (2.10). 

Perhaps  a  comment  on  what  we  are  not  doing  is  merited  here.  We 
are  not  introducing  a  new  density  which  is  now  a  function  of  the  original 
parameter  9  and  a  new  statistical  parameter  c.  It  is  helpful  to  view 
c  as  an  index  of  how  information  is  to  be  processed  and  which  is  entirely  at 
the  data  analyst's  disposal.  The  objective  is  still  the  estimation  of 
the  parameter  9  of  the  assumed  (and  tentative)  density  f(x[9)  based  on 
the  sample  data  xi>x2*"*,xn  but  now  we  are  interested  in  separating  out 
surprising  items  by  utilizing  in  the  estimation  process  different 
measures  of  information,  namely  those  indexed  on  c  and  given  in  (2.17). 

For  example.  Figure  1  provides  plots  of 

9  -  1  I  nc(x|o ,1)  I 

CX  C  '  [Q(0,l;c)]c/(1+c)  ' 

with  c=0,  .1,  .25,  .5,  1.  This  measure  of  information  is  bounded  for  c>0 

but  is  unbounded  for  c£0.  Increasingly  greater  weight  is  given  to  tail 

observations  as  c  decreases  from  zero. 

Apart  from  the  case  c=0,  an  interesting  special  case  is  provided 

n  , 

by  c=l  for  which  9  is  estimated  by  maximizing  ][  f(x.  |9)/Cr(9;l).  No 

i=l  1 

other  special  case  seems  to  be  of  articular  interest.  An  estimator  §c 
may  be  determined  as  that  value  of  9  which  maximizes  9 ) .  Equation 

(2.17)  holds  under  quite  general  conditions.  For  example,  9  and  the  x^ 
need  not  be  restricted  to  scalar  values,  and  the  density  f  need  not  be 
absolutely  continuous. 


The  above  construction  provides  a  means  by  which  we  can  determine 
the  response  of  a  solution  to  changes  in  the  way  the  information  is  pro¬ 
cessed.  In  particular,  the  quantity 


e  -  0  , 

c  c* 


c-c' 

will  provide  a  qualitative  measure  of  the  sensitivity  of  response  of  the 
estimator  §c  to  changes  in  the  user-specified  index  c.  The  values  c>0, 
c'-O  are  especially  interesting.  Surprising  items  are  not  permitted  to 
exert  a  large  influence  on  the  information  measure  under  c>0  and  thus  ? 
not  permitted  to  -exert  a  large  influence  on  the  parameter  estimates . 

When  the  interval  of  support  of  the  density  f(x|0)  is  infinite  on  both 
the  left  and  the  right  then  surprising  items  will  be  identified  by  a 
low  weight  fc(x|0)/Q(0;c)  in  (2.16)  (see  also  (2.15)).  Indeed,  the 
pattern  of  the  weights  fc(xi|0)/Q(0;c)  provides  very  useful  diagnostic 
information.  We  now  examine  some  special  cases. 


3.  The  Gaussian  Distribution 

The  most  important  error  model  in  statistics  is  provided  by  the 
Gaussian  distribution  given  in  (2.1).  Suppose  c>0  is  fixed.  We  shall 
derive  robust  estimators  for  y  and  a  from  (2.17).  We  find  by  straight¬ 
forward  integration  that 

j*  n1+C  (x|y,o2)dx  =  C(1+c)(2tt02)c]  **  =  Q(y,o2;c).  (3. 

_0D 

2  . 

Let  l  (u,o  )  be  the  two-parameter  analogue  of  (2.17).  Differentiation 
c 
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(Asymptotic  expressions  are  given  as  we  proceed  since  they  will  usually 
be  more  naturally  presented  in  this  manner.  Details  are  given  in  section 
7  •)  Thus  uc  and  are  asymptotically  independently  distributed.  The 
value  c  =  is  a  singularity  of  the  covariance  matrix  V  and  only  values 
-%<c«»  are  permissable.  The  asymptotic  efficiencies  for  yc  and  5^ 

A  A  2 

relative  to  yQ  and  3*  are  readily  computed  from  V  and  are  given  in  Table  1. 


Table  1 
*  ~2 

Efficiencies  of  the  Estimators  y  ,  o  for  Selected  Values  of  c 

c  c 


Estimators 

-.3 

-.2 

-.1 

0 

.1 

.2 

.3 

.4 

.5 

1.0 

5c 

.738 

.908 

.982 

1 

.988 

.959 

.921 

.880 

.838 

.650 

S2 

.631 

.825 

.964 

1 

.975 

.919 

.850 

.777 

.706 

.433 

c 


The  asymptotic  efficiencies  remain  high  over  a  broad  range  of  values  of 

A  a2 

c.  The  influence  function  for  y  and  a  at  the  Gaussian  distribution  with 

c  c 

2 

mean  y  and  variance  a  are  proportional  to  the  score  functions  (Huber, 
1981,  p.  45)  determined  from  (3.2)  and  (3.3)  respectively;  that  is 


IC(y  ;y,N)  *  (y-y)  fc(y|y,a  ), 


IC(S^;y,N)  «  {(l+cXy-y)^  -  a*}  fc(y|y,a^). 

Both  influence  functions  are  bounded  and  re-descendent  to  zero  for  all 

A  A  2 

c>0.  Both  estimators  yc  and  5^  have  high  breakdown  bounds  for  sample 
sizes  n«sl0. 

The  data  presented  in  Table  2  are  taken  from  David  and  Quesenberry 


.2,  .cC/ 


(3.8) 

(3.9) 


(1961)  and  are  tentatively  from  a  Gaussian  population.  Also  presented 
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in  Table  2  are  the  final  weights  v^c  for  c=0,  .3,  .5.  As  c  increases 
the  weights  v^  ,  i=14,15,16  decrease  dramatically  indicating  that  these 
observations  and  the  single  Gaussian  parent  assumption  may  not  be  mutually 
consistent.  Of  course  tail  observations  will  be  weighted  lower  as  c 
increases  even  if  the  data  were  Gaussian  but  not  to  the  extent  seen  here. 
The  rate  of  change  (v^4  3  -  v^4  Q )/( . 3-0 )  =  -1.02  is  substantial  but  it 

is  not  known  if  this  is  satatistically  significant.  Resolution  of  the 
distribution  of  the  rate  of  change  statistic  seems  to  be  a  difficult  prob¬ 
lem  but  one  worth  some  attention  and  is  perhaps  best  addressed  by  simu¬ 
lation.  The  parameter  estimates  pc  and  3c  are  presented  in  Table  3.  As 
c  increases  from  0  the  rate  of  change  of  vc  and  ac  is  large.  However, 
when  c  is  approximately  .85,  ac  no  longer  decreases  with  c  but  begins  to 
increase.  This  behavior  is  typical  of  the  behavior  of  <jc  when  the  data  are 
indeed  Gaussian.  The  estimators  uc  and  oc  should  be  approximately  inde¬ 
pendent  if  the  data  were  from  a  Gaussian  population.  When  c  increases 
from  0  to  .2,  the  estimated  asymptotic  correlation  p  increases  from  0  to 
.88.  However,  when  c  =  .85  this  estimated  correlation  is  approximately 
zero  while  for  c>.85  the  correlation  becomes  negative. 

This  example  captures  several  aspects  of  the  procedure  which  hold 
generally  for  Gaussian  error  models  with  or  without  structure.  First, 

the  weights  v.  can  be  advantageously  used  to  determine  the  extent  to 
1  c 

which  data  and  model  are  internally  consistent.  Observations  which  receive 

low  weights  are  prime  candidates  for  further  study.  Second,  one  might 

2 

expect  an  estimate  of  a  computed  from  (3.5)  to  decrease  with  increasing 

2 

c.  This  is  not  the  case.  When  the  data  are  Gaussian  the  estimate  of  a 


Table  2 


12 


will  remain  roughly  constant  or  increase  slightly  as  c  is  increased. 

The  estimate  of  u  also  remains  roughly  constant  as  c  increases.  Numerical 
and  sample  size  considerations  determine  the  magnitude  which  c  may  take. 

We  typically  find  OScSl  most  useful  for  the  Gaussian  distribution  although 
we  have  made  use  of  values  of  c>l  in  practical  settings.  Apart  from 
numerical  difficulties  -*■  max{f(x^)},  i.e.  the  mode,  as  c  becomes  large. 
Third,  the  estimated  asymptotic  correlation  provides  a  useful  diagnostic 
in  a  data  analysis.  These  three  comments  are  based  on  extensive  practical 
and  simulation  experience. 


4.  Other  Distributions 

We  now  show  that  (2.17)  may  be  used  to  construct  sensitivity 
analyses  and  robust  estimators  for  c  variety  of  distributions  other  than 
the  Gaussian.  It  is  worth  emphasizing  at  this  point  that  sensitivity 
analyses  which  are  concerned  primarily  with  error  models  are  not  of 
primary  importance  because  problems  associated  with  error  models  without 
intervening  structure  are  relatively  easy  to  deal  with.  The  main  interest 
is  in  having  a  procedure  which  is  capable  of  dealing  with  the  combination 
of  error  and  structural  models  since  problems  with  the  model  or  with  the 
data  or  with  both  may  be  very  difficult  to  detect  since,  quite  often,  effi¬ 
cient  estimation  procedures  hide  more  than  they  illuminate.  At  the  heart  of 
these  problems  is  the  modeling  process  itself.  The  ultimate  objective 
is  to  explore  and  uncover  an  appropriate,  hopefully  parsimonious,  model 


Table  3 


Parameter 

■  Estimates 

0  and  o  and 
c  c 

Estimated 

Asymptotic 

Correlation  p 

c 

A 

P 

A 

a 

P 

0.0 

.5189 

.189 

0 

0.2 

.4786 

.141 

.88 

0.3 

.4628 

.116 

.83 

0.5 

.4456 

.091 

.56 

0.6 

.4427 

.087 

.30 

0.7 

.4415 

.085 

.15 

0.8 

.4410 

.085 

.05 

0.85 

.4409 

.086 

.01 

0.9 

.4408 

.086 

-.03 

1.0 

.4407 

.087 

-.09 

1.5 

.4411 

.091 

-.31 

2.0 

.4421 

.094 

-.51 

■0.1 

.5344 

.198 

.78 

0.2 

.5497 

.203 

.78 

0.3 

.5651 

.204 

.77 

0.5 

.5989 

.195 

.77 
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to  describe  a  body  of  data.  However,  once  we  have  in  hand  a  procedure 
for  various  types  of  error  models,  the  extension  to  error-structural  models 
is  not  nearly  as  difficult.  Few  papers  have  dealt  with  error  models  other 
than  the  Gaussian.  Thall  (1979)  has  proposed  a  procedure  for  the  exponen¬ 
tial  distribution  based  on  a  modification  of  Huber's  procedure.  Hampel 
(1968)  has  proposed  a  general  procedure  but  it  does  not  seem  to  be  numer¬ 
ically  viable.  The  following  error  distributions  are  explicitly  considered 
because  they  are  used  in  structured  situations. 

a.  The  Logistic  Distribution.  The  logistic  distribution  is  a 
location  and  scale  family  which  is  not  a  member  of  the  exponential  class 
and  which  is  sometimes  used  in  place  of  the  Gaussian  distribution.  It  has 
density 

l(»|„,T).l  (4.1) 

{1  +  exp((x-w)/t)} 

for  t>0 ,  -<®<u«”.  The  corresponding  distribution  function  is 


T/„|„  .  exp((x-ii)/t) 

L(x'y,T)  ”  1  +  exp((x-p)/x)  * 


(4.2) 


The  integral 


f 


11+C  (x|u,t)  dx 


T  1 


(L(l-L)}  t  dx 


_1_ 

c 


uc(l-u)c  du 


(4.3) 


on  making  the  transform  u  =  L(x|ji,t).  B(1+c,1+c)  denotes  the  complete 
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beta  function  with  arguments  1+c  and  1+c.  Estimators  uc  and  tc  are 
determined  from  maximization  of  the  analogue  £c(y,T)  of  (2.17).  We  find 
with  a  little  effort  that  the  estimators  yc,  rc  jointly  satisfy 

n  /X.-y* 

l  1  (x. |y,t)  tanhl-s— -)  =  0,  (4.4) 

i=l  1  ZT 

and 

n  .  x.-y  /X.-y.. 

I  1°( x^ j y 5 t )  l+(l+c)  ---  tanh  j|  =  0.  (4.5) 

i=l 

The  equations  (4.4)  and  (4.5)  reduce  to  those  of  maximum  likelihood 
when  c=0.  The  score  functions  at  the  logistic  distribution  are  propor¬ 
tional  to  the  influence  functions  and  satisfy 

S(yc;y,l)  =  lC(y|y,t)  tanh  (^4,  (4.6) 

S(ti;y,l)  =  lC(y |y ,t)  (1+(1+c)  ^  tanh  (2g4},  (4.7) 

both  of  which  are  bounded  and  redescend  to  zero.  The  right  hand  side 
of  (4.6)  is  bounded  for  c=0.  The  right  hand  side  of  (4.7)  is,  however, 
not  bounded  for  c=0  and  increases  linearly  in  |y|. 

b.  The  Gamma  Distribution.  The  gamma  distribution  with  scale 
parameter  fj  and  shape  parameter  a  has  density 

a-1  -x/B 

g(x|a,0)  *  -  ,  x  >  0  (4.8) 

r(a)B° 

fora-,  6>0.  This  distribution,  especially  when  a=l,  is  widely  used  as 

a  failure  distribution.  Using  (2.17)  we  find  that  the  estimators  ac, 

8  jointly  satisfy 
c 


15 


l  gC(x^|a,g)  {log  -  log  (-5—-)  ~  <p(o(  1+c) }  =0  (4.9) 

i_l 

n  x. 

£  gC(x.|a,8)  {(1+c)  ~  -  (a(l+c)-c)}  =  0,  (4.10) 

1=1  1  S 

when  xi»x2 » • • • »xn  is  a  random  sample  putatively  from  the  distribution 
(4.11).  Here  41(2)  is  the  digamma  function  with  argument  z.  As  a 
special  case  take  o=l.  Then  the  estimator  for  the  mean  of  an  expo¬ 
nential  distribution  satisfies  the  implicit  equation 


g  =  (1+c) 


=  (1+c) 


t  gc(x. |i,e) 
i=i  1  1 

I  gc(x.|l,g) 
i=l 

-cx./B 

F  x.  e  1 
L  1 _ 

-cx./B 

.  Ie  1 


(4.11) 


Thus  observations  which  are  far  removed  from  g  will  receive  a  low  weight 
-cx; / 6 

e  ~  when  c>Q. 

Eyen  though  the  estimators  ac  and  gc  are  in  a  reasonably  attractive 
form  it  is  numerically  and  statistically  more  appealing  to  make  the 
transformation  y  =  log  x.  The  resulting  density  is 


g*(y|a,<j>)  =  exp{a(y-<|>)  -  exp(y-(j,)},  -«<y<». 


(4.12) 


where  $  =  log  g  plays  the  role  of  a  location  parameter  and  now  a  is 
similar  to  a  scale  parameter.  The  log-gamma  density  is  unimodal  with 
well-behaved  tails.  On  evaluating  Qft(a,$»c)  corresponding  to  (4.12)  and 


substituting  in  (2.17)  we  find  directly  that  the  estimators  $c  and 
satisfy 

n 

l  gj^y^ja,*)  (exp(y.-^)  -  a)  =  0,  (4.13) 

i=l 

n 

l  g*  (y..  |ot,+ )  (y.  -  4>  +  log(l+c)  -  ip( a) }  =  0.  (4.14) 

i=l 

Ci  c/(dn*c) 

In  this  case  gA(y|o,$)/Qft  is  bounded  in  y  for  all  <)>  and  a>0  while 

Cl  c/ { 1 +c ) 

g(x|a,0)/Q  is  not  bounded  for  a<l  and  all  c>0  as  x-K).  When  the 

support  of  the  random  variable  is  nontrivially  on  (-»,»)  the  information 

quantity  is  always  bounded  when  c>0. 

In  the  special  case  when  o=l  and  assumed  known  and  only  6  is  being 

estimated  the  estimator  $  satisfies  the  implicit  relationship 

c 


n  -CX./8 

r  1+C  1 

)  x.  e 
i 

A  r 

n  -cx./0 

V  c  „  l 
1  e 
i=l 


(4.15) 


as  may  be  verified  from  (4.13)  and  letting  x  =  e^.  The  structural 
differences  in  the  estimators  for  0  provided  by  (4.11)  and  (4.15)  are 
interesting  in  their  own  right. 

The  score  functions  for  ac  and  8C  at  the  log-gamma  distribution 
with  o=2,  <J>=0  are  plotted  in  Figure  2  for  several  values  of  c.  Observe 

that  the  score  functions  for  5  and  $  are  both  unbounded  when  c=0. 

c  c 

c.  The  Weibull  Distribution.  The  Weibull  distribution  with  shape 


parameter  k  and  scale  parameter  9  is  given  by 
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w(x|k,8)  =  |  (f)k_1  exp{-  (|)k),  6>0,  k>0.  (4.16) 

It  is  more  convenient  to  deal  with  the  distribution  of  y  =  log  x 
than  with  that  of  x  itself.  The  log-Weibull  (or  Type  I  extreme  value) 
density  is  given  by 


|k,<j>)  =  k  exp[k(y-ij>)  -  exp  (k(y-<f>)}],  -»<y<». 


(4.17) 


where  $  =  log  0.  The  function 


Q*(k.<f>,c)  =  w*+c  (y|k,(j))  dy  = 


T(l+c) 


1+c) 


1+c  • 


(4.18) 


We  thus  obtain  from  (2.16)  or  (2.17)  the  joint  estimating  equations 
for  the  pa  ’ameters  $  and  k,  respectively. 


n 

][  [-  k  +  k  exp{k(y^-$) }]  k  exp{ck(y^-<j>)  -  c  exp{k(y^-i|>) }}  =  0,  (4.19) 

i=l 


l  C(l+c)  {i  +  (yi-4i)-(yi-<|i)exp{k(yi-<)»)}}-  kC  exp[ck(yi-(J>) 
i=l 

-  c  exp{k(y^-$)}}]  =  0  .  (4.20) 

Figure  3  depicts  the  score  functions  S(k,y,wft)  and  S($,y,wft)  at  the  log- 
Weibull  distribution  with  <£=0  and  k=l.  These  functions  are  bounded  and 
redescending  to  zero  when  c>0.  The  estimators  are  thus  qualitatively 
robust  for  fixed  c>0 .  Estimators  for  k  and  0  can  also  be  developed  without 
the  transformation  y  =  log  x  but  we  prefer  those  of  (4.19)  and  (4.20). 

The  efficiencies  of  the  estimators  $  and  k  and  the  asymptotic  corre- 
lation  p($c,kc)  between  the  estimators  are  given  in  Table  4.  It  is 
interesting  to  observe  that  for  a  given  index  c  the  efficiencies  for  the 


Table  4 


Joint  Asymptotic  Relative  Efficiencies  of  the 
Log-Weibull  (or  extreme  value)  Estimators 


c 

eff.  ($) 

eff.  (£) 

asym.  corr 

0.0 

1.0 

1.0 

.313 

0.1 

.987 

.974 

.313 

0.2 

.957 

.917 

.312 

0.3 

.919 

.847 

.311 

0.4 

.877 

.775 

.308 

0.5 

.835 

.705 

.306 

0.6 

.793 

.639 

.303 

0.7 

.754 

.580 

.300 

0.8 

.716 

.527 

.296 

0.9 

.680 

.479 

.293 

1.0 

.647 

.436 

.289 

1.5 

.511 

.283 

.272 

2.0 

.414 

.193 

.256 

3.0 

.290 

.103 

.230 

0.0 

1.0 

1.0 

.313 

-0.1 

.980 

.961 

.314 

-0.2 

.898 

.814 

.320 

-0.3 

.703 

.522 

.349 

-0.4 

.325 

.155 

.475 

.  ^  A 

(<t>,k) 


estimators  of  the  location  parameter  $  and  of  the  scale  parameter  k  are 
very  close  to  those  given  in  Table  1  for  the  location  and  scale  estimators 
of  the  Gaussian  distribution  for  the  same  c.  For  example,  the  efficiency 
of  $  _  is  .835  while  the  efficiency  of  y  is  .838.  This  characteristic 

•  0  »  D 

seems  to  hold  more  generally. 

d.  The  Poisson  Distribution.  The  Poisson  distribution  with  mean  u 
has  frequency  function 

x  -y 

P(x|y)  =  -U— ■ j—  ,  x  =  0,1,2,...  .  (4.21) 

The  development  of  the  self-critical  estimators  is  exactly  as  above. 

However,  in  this  case  the  function 


Q(y ;c)  =  l 
x=0 


P1+C(x|y) 


(4.22) 


cannot  be  evaluated  in  closed  form.  The  estimator  y  of  y  must  involve  an 
iterative  evaluation  of  Q(y;c)  in  the  implicit  equation  (2.16).  If  y  is 
large,  a  normal  approximation  to  the  Poisson  may  be  used,  say  through 
(3.2),  to  reduce  the  computational  effort.  The  exact  implicit  equation 
for  y  is 

f 

x. 


I  P  (x.|y) 
j=l  1 


l  U 

x=0 


(l+c)x-l/(x!)l+c 


l  V 

x=0 


(1+c)x/(xj)l+c 


=  0. 


(4.23) 


e.  Clearly,  equations  (2.16)  and  (2.17)  may  be  applied  to  a  wide 
variety  of  distributions.  For  example,  the  self-critical  procedure  has 
been  successfully  used  for  the  negative  binomial,  the  binomial,  the  Pareto 


and  the  multivariate  normal  distributions  in  addition  to  those  discussed 
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5.  An  Example 

The  data  in  Table  5  represents  the  time  to  death  of  64  infants. 

These  infants  were  ostensibly  the  victims  of  sudden  infant  death  syndrome. 
Apart  from  the  seven  longest  survival  times,  the  Weibull  model  provides 
a  reasonable  model  for  these  data.  However,  since  the  hazard  function 
of  the  extreme  value  distribution  is  strictly  increasing,  the  Weibull 
or  extreme  value  model  would  not  be  appropriate.  Biological  considerations 
suggest  that  the  hazard  function  for  sudden  infant  death  syndrome  should 
be  first  increasing  and  then  decreasing.  The  lognormal  distribution  may 
provide  an  appropriate  statistical  model  for  these  data.  Our  main  purpose, 
however,  is  to  illustrate  the  procedure  we  propose.  Table  6  provides 
the  estimates  of  extreme  value  and  lognormal  parameters  for  various  values 
of  c. 

For  both  the  log-Weibull  and  log-normal  models,  the  variation  in  c 
produces  a  dramatic  change  in  the  response  surface  generated  by  the  para¬ 
meter  estimates.  For  example,  the  estimate  of  k  changes  from  1.36  to 

2 

2.33  as  c  goes  from  zero  to  unity.  The  estimate  of  a  changes  from  .45 
to  .20  as  c  moves  from  zero  to  unity.  The  estimate  of  8  diminishes  from 
118.9  to  85.4  as  c  moves  from  zero  to  unity.  This  sensitivity  of  parameter 
estimates  to  change  in  the  value  of  c  which  reflects  the  way  in  which  the 
information  is  processed  indicates  the  inconsistency  of  both  models  with 
the  data  or  that  some  of  the  data  are  not  consistent  with  the  model. 

Of  course,  a  simple  probability  plot  will  be  equally  effective  in  genera¬ 
ting  the  same  conclusion  in  the  case  of  unstructured  data.  However,  when 
we  are  dealing  with  combined  structure  and  error  models,  inconsistencies 


o  CO  in  rt 


Table  5 


Tines  to  Death  (in  days)  in  an  Epidemiological  Study 


17 

77 

113 

22 

63 

80 

123 

25 

63 

82 

128 

34 

65 

87 

135 

34 

65 

87 

146 

34 

65 

88 

148 

35 

66 

92 

149 

39 

66 

93 

158 

42 

67 

96 

160 

43 

68 

99 

218 

44 

73 

100 

234 

54 

74 

101 

267 

55 

75 

102 

329 

56 

76 

106 

372 

57 

77 

108 

455 

57 

77 

110 

492 

Table  6 


Paremeter  Estimates  for  the  Sudden 
Infant  Death  Syndrome  Data 


log-Weibull(n=64)  lo  g-normal ( n= 6  4 )  log-gamma(n=64)  log-Weibull(n=57) 


- a, - 

§*e* 

4 

J2 

6 

A 

a 

A 

e 

A 

k 

118.9 

1.36 

4.43 

.45 

49.7 

2.16 

89.4 

2.43 

97.2 

1.67 

4.39 

.38 

30.3 

2.97 

88.3 

2.35 

89.1 

2.14 

4.37 

.31 

21.3 

3.94 

87.5 

2.32 

85.4 

2.33 

4.36 

.20 

15.6 

5.19 

85.3 

2.35 
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are  often  hidden  by  the  c=0  analyses.  Many  of  these  inconsistencies  may 
be  uncovered  by  changing  the  way  in  which  the  data  is  processed,  namely, 
by  studying  the  response  of  parameter  estimates  and  observational  weights 
to  variation  in  c. 

The  sensitivity  analysis  identifies  the  largest  seven  and  the 
smallest  three  times  as  being  most  inconsistent  with  the  log-Weibull  or 
log-normal  model.  Early  deaths  are  generally  consistent  with  an  alter¬ 
nate  mode  of  failure  rather  then  the  sudden  infinite  death  syndrome. 
Several  of  the  late  deaths,  including  the  latest,  were  of  a  suspicious 
nature.  When  the  largest  seven  observations  are  removed  and  the  analyses 
repeated,  the  parameter  estimates  and  the  observational  weights  v^c  remain 
much  more  stable  as  do  the  parameter  estimates  for  both  the  log-Weibull 
and  log-normal  models.  This  is  illustrated  for  the  log-Weibull  model  in 
Table  6. 


6.  Some  Regression  Models 

A  great  deal  of  attention  has  been  devoted  to  robust  regression 
in  the  case  in  which  the  underlying  error  distribution  is  assumed  to  be 
approximately  Gaussian.  The  self-critical  procedure  we  introduced  in 
section  2  can  be  expected  to  be  very  useful  when  there  are  inconsistencies 
between  model  and  data  in  the  y-direction.  Difficulties  in  the  x-direction, 
that  is  in  the  factor  space,  will  require  augmentation  of  the  procedures 
given  in  section  2.  We  shall  briefly  examine  the  cases  in  which  the  error 
distribution  is  Gaussian,  extreme  value  (or  log-Weibull),  and  Poisson. 


i 


First  consider  the  regression  model 


yij  =  h(-i>-)  +  zij  (6‘ 

where  x.  =  (x.. ,x.„, . . . ,x.  )  is  the  ith  set  of  values  of  the  m  indepen- 
-i  ii  iz  lm  r 

dent  variables,  n^  is  the  number  of  replicates  of  the  ith  experimental 

T 

condition,  9  =  (®i»®2****,®p)  a  vector  of  unknown  parameters, 

is  a  particular  realization  of  the  experiment,  and  z„  are  the  error 

terms.  The  regression  function  h(x,0)  relates  the  expected  value  of  the 

dependent  variable  to  the  independent  variables  and  the  parameters,  and 

given  the  x^  and  the  y\  . ,  we  wish  to  estimate  9.  Assume  first  that  the 

2 

z„  are  approximately  Gaussian  with  mean  0  and  variance  a  .  Then  given 

2 

the  x^,  we  choose  as  parameter  estimates  those  values  of  9  and  a  which 
maximize 


7  l  l 
i  i  I 


nc(z„  |g,x,q2) 

[Q(9,»2i  =  )Jc/U+c) 


where 

n(z.  .  |9,x,o2)  =  (2itcj2)~35  exp( - (y. .-h(x. ,9) )2},  (6.3 

-1  2a  -1 

0  On  -it 

and  Q(9,ct  ;c)  =  [(l+c)(2ira  )  ]  as  in  (3.1).  The  joint  estimators 
2 

of  9  and  o  for  various  values  of  c  will  allow  us  to  perform  a  sensitivity 
analysis. 

Type  I  extreme  value  regression  arises  as  a  natural  consequence 
of  the  parametric  proportional  hazards  model  of  Cox  (1972).  In  this 
case  the  z^  of  (6.1)  will  follow  the  distribution  (4.15).  Estimators 
for  9  and  k  will  be  determined  from  maximization  of 
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where 


w£(zi;.  |0,x,k) 
:Q(8,ki=:c/(lre) 


1 


(6.4) 


wft(z„  |§,x,k)  =  k  exp[k(yij-f(x^,6) )  -  expCkCy^-fCx^e) )],  (6.5) 

and  Q(0,k;c)  is  given  by  (4.15).  We  have  found  extreme  value  regression 
to  be  a  useful  alternative  to  the  usual  Gaussian  regression  in  several 
engineering  applications  where  an  analysis  of  the  hazard  structure 
dictated  a  model  other  than  Gaussian.  The  ability  to  perform  a  sensi¬ 
tivity  analysis  or  a  robust  analysis  in  this  case  has  been  of  considerable 
use  in  model  evolution  as  well. 

If  we  constrain  f(x^,0)  to  be  positive  we  may  also  develop  in  a 
straightforward  fashion  a  sensitivity  analysis  or,  for  fixed  c>0,  a  robust 
analysis  for  Poisson  regression.  Finally,  the  procedure  of  section  2 
produces  attractive  and  easy  to  use  procedures  for  experimental  design. 

We  shall  consider  the  regression  and  design  topics  elsewhere. 


7.  Asymptotic  Covariances  and  Efficiencies 

The  self -critical  estimators  are  M-estimators  and  as  such  are 
consistent  and  asymptotically  normal  under  regularity  conditions  similar 
to  those  of  regular  maximum  likelihood  estimators.  Just  as  in  the  case 
of  maximum  likelihood,  c=0,  we  cannot  always  be  sure  that  there  will 
be  a  unique  solution  for  the  estimating  equations.  From  among  the  local 
optima,  the  consistent  solution  is  assumed  to  be  the  one  taken. 
Asymptotic  variance-covariance  matrices  of  the  estimators  are  readily 
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determined  from  (2.16)  or  (2.17)  by  standard  expansion  arguments.  Let 

xl’x2’"*’xn  a  random  sample  from  the  density  or  frequency  function 
i  T 

f(x|0)  where  0  =  •  •  »9p)  is  a  pxi  vector  of  parameters.  Let 


fc(x|g) 

[Q(0;c)]°/(c+1) 


(7.1) 


where  the  random  variable  X  has  the  same  distribution  as  the  x  . 

1 

the  asymptotic  variance -covariance  matrix  for  the  estimator  0  is 

~  c 

determined  from  the  matrices  g  and  £  whose  elements  are  given  by 


Then 


00' 


E 


!f=x\ 

30  30'  /  ’ 


(7.2) 


H00'  =  E(3030'  )  »  (7,3) 

where  0,  0'  =  . .  ,0p.  The  asymptotic  covariance  matrix  of  the 

estimator  0  is 
~c 

l  =  n_1  H-1  V  H_1.  (7.4) 

From  (2.15)  we  have 


r fitc  i  <«•)  -  iitp)  «*  ■  <•* 

-CO 

differentiating  this  expression  with  respect  to  0*  we  find 


I*  [(1.0  ii§M  |(i.o 


3  log  f  3  log  Q 
30  30 


+  f 


1+c 


2 

3  log 


3030 


M|J 


dx  = 


(d+c) 

0 


3 2  log  f 
3030' 


t 
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which  implies 


I  c  3  log  f  |  3  log  f  1  3  log  Q  |  .  1  „  T „c  I 

Elf  30 '  I  30  1+c  - 30  I - E  l f  | 


3  log  f 
3030' 


1  3  log  Q 

(1+c)  3030’ 


^1] 


(7.5) 


a  generalization  of  Fisher's  information  identity.  We  thus  find  that 

r  °  V 1  -i  l  _ r  *. _ a  .  i 


while 
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The  expressions  (7.6)  and  (7.7)  are  often  easy  to  evaluate.  Equation 
(7.4)  has  been  used  to  produce  the  efficiencies  quoted  in  section  3  and 
4.  Both  (7.6)  and  (7.7)  are  easy  to  approximate  in  the  practical  setting 
where  the  true  value  of  0  is  not  known,  for  example 


3 2  l 

n  cx. 


^  —i  y  j  [ 

H00'  ~  n  .£  3030'  j g=Q  ’  ( 

3—1  c 

or  the  estimators  §  may  be  substituted  directly  in  the  expression  for 
expectations. 

Based  on  simulation  experience,  the  estimators  §c  approach  their 
asymptotic  distributions  very  rapidly,  often  for  n  as  small  as  10  and 
20.  This  is  due  to  the  smoothing  induced  by  the  term  fC  in  (2.16). 


(7.8) 
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8.  Additional  Families  of  Estimators  and  Discussion 


Let  f(x|0)  be  continuous  density  defined  over  the  nontrivial 

interval  of  support  (-«,<»)  and  let  0  take  on  values  in  an  open  set  0. 

Let  g(x|e)  be  a  function  of  x  over  -«<x<®,  0e0.  Suppose  further  that 

there  exits  a  function  k  (0)  such  that 

g 


I  f( x| 0 )  g(x|0)  dx  =  kg(0). 


(8.1) 


We  have  then  that  the  inner  product  of  f  with  g  is  normalized  by 


k  ( 0 ) ,  i.e. 


f(x| 8 )  g(x|8 ) 

k  (6)  “ 

g 


Under  mild  regularity  conditions 


(8.2) 


30 
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8  1-8 


If  x1,x2,...,xn  is  a  random  sample  from  f(x|0),  then  an  estimator  for 
0  may  be  determined  from  the  consistent  zero  of  (see  (2.15)) 
n  .  3  log  f(x. |0)  3  log  g(x. (0)  3  log  k  , 

I — or1—  * — - rr^l  ">•  (8-“» 


Because  we  have  taken  g(x*J@)  to  be  fc(x;(0),  in  sections  2-6,  we  have 
termed  the  estimators  of  this  paper  self-critical.  The  choice  fc(x^|0) 
seems  to  be  most  useful  in  practice  although  the  estimators  determined 
from  (8.4)  may  prove  to  be  useful  in  some  contexts. 


i 


^x. . 
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The  procedure  proposed  in  this  paper  is  envisioned  to  be  of 
greatest  use  in  sensitivity  analysis  and  model  evolution  settings.  It 
also  provides  a  tool  for  the  identification  of  outliers  in  the  structural 
data  case  since  the  model  plays  a  direct  role  (through  the  f  term  in 
(2.16)  or  (2.17))  in  the  processing  of  information.  Finally,  we  have 
provided  a  general,  easily  used,  computationally  attractive  procedure 
which  steins  from  a  single  primitive  concept  for  the  construction  of  simul¬ 
taneous  robust  estimators. 
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